Session information

## R version 4.4.0 (2024-04-24 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: Asia/Bangkok
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] tidyr_1.3.1              magrittr_2.0.3           ape_5.8                 
##  [4] cluster_2.1.6            plotrix_3.8-4            scales_1.3.0            
##  [7] plyr_1.8.9               RColorBrewer_1.1-3       extrafont_0.19          
## [10] MALDIquantForeign_0.14.1 MALDIquant_1.22.2       
## 
## loaded via a namespace (and not attached):
##  [1] jsonlite_1.8.8           dplyr_1.1.4              compiler_4.4.0          
##  [4] tidyselect_1.2.1         Rcpp_1.0.12              parallel_4.4.0          
##  [7] jquerylib_0.1.4          yaml_2.3.8               fastmap_1.2.0           
## [10] lattice_0.22-6           R6_2.5.1                 generics_0.1.3          
## [13] knitr_1.47               XML_3.99-0.16.1          tibble_3.2.1            
## [16] munsell_0.5.1            readBrukerFlexData_1.9.2 pillar_1.9.0            
## [19] bslib_0.7.0              rlang_1.1.4              utf8_1.2.4              
## [22] cachem_1.1.0             Rttf2pt1_1.3.12          xfun_0.45               
## [25] sass_0.4.9               cli_3.6.2                digest_0.6.35           
## [28] grid_4.4.0               rstudioapi_0.16.0        base64enc_0.1-3         
## [31] lifecycle_1.0.4          nlme_3.1-165             vctrs_0.6.5             
## [34] readMzXmlData_2.8.3      evaluate_0.24.0          glue_1.7.0              
## [37] extrafontdb_1.0          fansi_1.0.6              colorspace_2.1-0        
## [40] purrr_1.0.2              rmarkdown_2.27           pkgconfig_2.0.3         
## [43] tools_4.4.0              htmltools_0.5.8.1

Principle of the CCI algorithm

Figure 1. Principle of the cross-correlation algorithm. (A) A typical raw MALDI-TOF mass spectrum of An. minimus; (B) the corresponding processed spectrum after intensity normalization and baseline removal; (C) comparison of the processed mass spectra of two An. minimus specimens over the 5000-5500 Da mass interval, showing high similarity between spectra; (D) comparison of the processed mass spectra of an An. minimus specimen and an An. maculatus specimen over the 5000-5500 Da mass interval, showing limited similarity between spectra; (E) the cross-correlation function of the two An. minimus spectra over the 5000-5500 Da mass interval, with a local maximum of 0.982; (F) the cross-correlation function of the An. minimus and An. maculatus spectra over the same interval, with a local maximum of 0.540. If no local maximum of the cross-correlation function is detected, the algorithm is parameterized to return 0. The resulting cross-correlation index on the log scale (log10CCI) over the 3000-12000 Da mass range is -4.9 for the two An. minimus spectra and -Inf (negative infinity) for the An. minimus and An. maculatus spectra.
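The computation summarized in Figure 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used for this analysis. It assumes the following:

* Both spectra are processed intensity vectors on a common, evenly binned m/z grid.
* Each fixed-width window stands for one mass interval (such as 5000-5500 Da).
* The window score is the highest local maximum of the normalized cross-correlation within a hypothetical `max_lag` bins.
* The index is the product of the window scores. This is consistent with a single window returning 0 being enough to drive log10CCI to -Inf.

All function names are hypothetical.

```python
import math
from typing import Sequence


def cross_correlation(x: Sequence[float], y: Sequence[float], max_lag: int) -> list[float]:
    """Normalized cross-correlation of two equal-length windows for lags -max_lag..max_lag."""
    n = len(x)
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    values = []
    for lag in range(-max_lag, max_lag + 1):
        # Sum x[i] * y[i + lag] over the bins where both windows overlap.
        s = sum(x[i] * y[i + lag] for i in range(n) if 0 <= i + lag < n)
        values.append(s / norm if norm > 0 else 0.0)
    return values


def window_score(x: Sequence[float], y: Sequence[float], max_lag: int) -> float:
    """Highest interior local maximum of the cross-correlation, or 0 if there is none (assumption)."""
    r = cross_correlation(x, y, max_lag)
    peaks = [r[k] for k in range(1, len(r) - 1) if r[k - 1] < r[k] > r[k + 1]]
    return max(peaks) if peaks else 0.0


def log10_cci(x: Sequence[float], y: Sequence[float], window: int, max_lag: int) -> float:
    """Sum of log10 window scores, i.e. log10 of their product (-inf if any score is 0)."""
    if len(x) != len(y):
        raise ValueError("spectra must share the same m/z grid")
    total = 0.0
    for start in range(0, len(x), window):
        score = window_score(x[start:start + window], y[start:start + window], max_lag)
        if score <= 0.0:
            return -math.inf  # one window without a local maximum zeroes the whole product
        total += math.log10(score)
    return total
```

Summing the log10 scores, rather than multiplying raw scores, avoids floating-point underflow when many windows are combined. In the toy case below, two identical spike trains score 0, and a second spectrum with an empty window scores -Inf.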

Sample collection and preparation

## [1] "Entomological surveys were carried out in 33 villages between 2020-11-09 and 2022-10-04"
## [1] "Mosquito samples were received in the laboratory 1 to 10 days after collection"
## [1] "Mosquito samples in the reference panel were processed in MALDI-TOF MS after 85 to 422 days of storage at -80°C"
## [1] "Mosquito samples in the test panel were processed in MALDI-TOF MS after 392 to 1069 days of storage at -20°C"

Panel composition

## [1] "403 Anopheles specimens were selected for inclusion in either the reference or the test panel (270 and 133 specimens, respectively)"
## [1] "254 specimens of the reference panel could be identified with PCR (ITS2: 32; COI: 24; ITS2+COI: 198)"
## [1] "105 specimens of the test panel could be identified with PCR (ITS2: 105; COI: 0; ITS2+COI: 0)"
## [1] "In total, 359 PCR-identified specimens were assigned to 26 taxa including 21 sensu stricto species and 5 sibling species pairs or complexes"
##  [1] "An. aconitus s.l."            "An. annularis s.l."          
##  [3] "An. baimaii"                  "An. campestris/wejchoochotei"
##  [5] "An. culicifacies s.l."        "An. dirus"                   
##  [7] "An. dissidens"                "An. dravidicus"              
##  [9] "An. interruptus"              "An. jamesii"                 
## [11] "An. jeyporiensis"             "An. karwari"                 
## [13] "An. kochi"                    "An. maculatus"               
## [15] "An. minimus"                  "An. nivipes"                 
## [17] "An. peditaeniatus"            "An. philippinensis"          
## [19] "An. pseudowillmori"           "An. saeungae"                
## [21] "An. sawadwongporni"           "An. sinensis"                
## [23] "An. splendidus"               "An. tessellatus s.l."        
## [25] "An. vagus"                    "An. varuna"
## [1] "2 specimens identified with COI-PCR as An. baimaii/dirus were excluded from the panel because other specimens of An. baimaii and An. dirus identified with ITS2-PCR were available"
## [1] "4 specimens identified by morphology as members of the Barbirostris Group were removed from the panel despite amplification of the ITS2 region because the clean portion of the sequence was too short"
Table 1. Panel composition
subgenus   group         species                       reference  test  total
Anopheles  Asiaticus     An. interruptus                       1     1      2
Anopheles  Barbirostris  An. campestris/wejchoochotei          1     0      1
Anopheles  Barbirostris  An. dissidens                        24     0     24
Anopheles  Barbirostris  An. saeungae                          4     0      4
Anopheles  Hyrcanus      An. peditaeniatus                     3     2      5
Anopheles  Hyrcanus      An. sinensis                         17     2     19
Cellia     Annularis     An. annularis s.l.                   10     6     16
Cellia     Annularis     An. nivipes                          13     3     16
Cellia     Annularis     An. philippinensis                    4     1      5
Cellia     Funestus      An. aconitus s.l.                     2     6      8
Cellia     Funestus      An. culicifacies s.l.                14     5     19
Cellia     Funestus      An. jeyporiensis                      6     9     15
Cellia     Funestus      An. minimus                          40    11     51
Cellia     Funestus      An. varuna                            2     0      2
Cellia     Jamesii       An. jamesii                          15     2     17
Cellia     Jamesii       An. splendidus                       11     7     18
Cellia     Kochi         An. kochi                            24     7     31
Cellia     Leucosphyrus  An. baimaii                          12     6     18
Cellia     Leucosphyrus  An. dirus                             0     2      2
Cellia     Maculatus     An. dravidicus                        3     2      5
Cellia     Maculatus     An. maculatus                        16     3     19
Cellia     Maculatus     An. pseudowillmori                   10     3     13
Cellia     Maculatus     An. sawadwongporni                    5     2      7
Cellia     Subpictus     An. vagus                            13     8     21
Cellia     Tessellatus   An. tessellatus s.l.                  2     9     11
Cellia     Unclassified  An. karwari                           2     8     10

Construction of the reference MSL

Characteristics of the reference MSL

## [1] "2535 mass spectra of the 254 reference Anopheles specimens identified with PCR were acquired, yielding 3211845 pairwise comparisons between distinct spectra"
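The number of comparisons corresponds to every unordered pair of distinct spectra, including pairs of technical replicates of the same specimen:

```latex
\binom{2535}{2} = \frac{2535 \times 2534}{2} = 3\,211\,845
```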

Repeatability, reproducibility and specificity of mass spectra

## [1] "The median log10CCI was -7.9 (IQR: -9.2 to -6.8) for comparisons of technical replicates of the same specimen."
## [1] "The median log10CCI was -10.7 (IQR: -12.6 to -9.4) for comparisons of spectra of different specimens of the same species."
## [1] "The median log10CCI was -Inf (IQR: -Inf to -Inf) for comparisons of spectra of different species."
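A median of -Inf only means that at least half of the comparisons in that category had a CCI of 0, i.e. a log10CCI of -Inf. The sketch below shows one way to compute such summaries in Python. It assumes R's default (type 7) quantile interpolation, which the R session above presumably used. It also interpolates only between unequal neighbours, so that pairs of -Inf values do not turn into NaN. The function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def quantile_type7(values: Iterable[float], p: float) -> float:
    """Quantile with linear interpolation between order statistics (R type 7)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values to summarize")
    h = (len(xs) - 1) * p
    lo, hi = math.floor(h), math.ceil(h)
    if xs[lo] == xs[hi]:
        # Equal neighbours (including two -inf values): no interpolation, so no -inf - -inf = nan.
        return xs[lo]
    w = h - lo
    # With xs[lo] = -inf and 0 < w < 1 the result stays -inf.
    return (1 - w) * xs[lo] + w * xs[hi]


def median_iqr(values: Iterable[float]) -> Tuple[float, float, float]:
    """Return (median, first quartile, third quartile) of log10CCI values."""
    xs = list(values)
    return quantile_type7(xs, 0.5), quantile_type7(xs, 0.25), quantile_type7(xs, 0.75)
```

Applied to the toy values [-inf, -inf, -inf, -8.0], it returns -inf for the median and both quartiles. This is the same pattern reported above for comparisons between species.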

Figure 3. Repeatability, reproducibility and specificity of the mass spectra. (A) Median log10CCI of pairwise comparisons between technical replicates of the same specimen, collated by mass spectrum, and (B) corresponding density function; (C) median log10CCI of pairwise comparisons between spectra of different specimens of the same species, collated by mass spectrum, and (D) corresponding density function; (E) median log10CCI of pairwise comparisons between spectra of different species, collated by mass spectrum, and (F) corresponding density function. Spectra with low intra-specimen repeatability (median log10CCI < -12, 39 spectra) and low inter-specimen reproducibility (median log10CCI < -14, 115 spectra) are shown in orange in panels A and C, respectively.

Heatmap grid

Figure 4. Heat map grid of the median cross-correlation index collated by specimen included in the reference mass spectra database. Red cells on the diagonal show the high reproducibility of the mass spectra, and blue cells off the diagonal show their high specificity. Orange cells off the central diagonal show the high similarity between sibling species of the Barbirostris complex and between some species of the Neomyzomyia series. Negative infinite values are shown in white.
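A specimen-level grid of this kind could be assembled from spectrum-level comparisons as sketched below. The sketch rests on two assumptions. First, log10CCI values are stored per unordered pair of distinct spectra. Second, each cell is the median over all spectrum pairs linking two specimens, so diagonal cells pool the technical replicates of one specimen. The dictionary layout, identifiers and function names are hypothetical. The median keeps -inf rather than producing NaN, which corresponds to the white cells.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def median_keep_inf(values: List[float]) -> float:
    """Median that returns -inf (not nan) when the middle values are -inf."""
    xs = sorted(values)
    mid = len(xs) // 2
    if len(xs) % 2 == 1 or xs[mid - 1] == xs[mid]:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2


def specimen_grid(
    pair_scores: Dict[Tuple[str, str], float],
    specimen_of: Dict[str, str],
) -> Dict[Tuple[str, str], float]:
    """Median log10CCI for every pair of specimens (diagonal = technical replicates)."""
    cells: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for (spectrum_a, spectrum_b), score in pair_scores.items():
        # Sort specimen ids so that (A, B) and (B, A) fall into the same cell.
        a, b = sorted((specimen_of[spectrum_a], specimen_of[spectrum_b]))
        cells[(a, b)].append(score)
    return {key: median_keep_inf(scores) for key, scores in cells.items()}
```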

Dendrogram

Figure S1. Dendrogram showing the output of hierarchical clustering analysis.

Best match analysis

## [1] "2477/2535 (97.7%) of the spectra included in the reference database matched with the same species (median log10CCI: -7.8; IQR: -8.8 to -7.0)"
## [1] "58/2535 (2.3%) of the spectra included in the reference database matched with another species"
## [1] "Among the mismatches, 19 were spectra of species represented by only one specimen and thus not included in the queried dataset because self-matching was disabled (median log10CCI: -13.4; IQR: -14.1 to -9.6), and 39 were true cross-matches between two referenced species (median log10CCI: -11.7; IQR: -13.6 to -9.8)."
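The best-match step and the three result categories of Figure 5 can be sketched as follows. This is an illustrative reading of the text, not the original code, and it makes three assumptions:

* log10CCI values are available per unordered spectrum pair.
* Self-matching is disabled by skipping every pair of spectra from the same specimen.
* A mismatch is labelled as an unreferenced species when no other specimen of the query species exists in the database.

All names are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def best_matches(
    pair_scores: Dict[Pair, float], specimen_of: Dict[str, str]
) -> Dict[str, Tuple[str, float]]:
    """Best-scoring reference spectrum for each query spectrum, self-matches excluded."""
    best: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in pair_scores.items():
        if specimen_of[a] == specimen_of[b]:
            continue  # technical replicates of the same specimen are never matched
        # Each unordered pair is considered in both query directions.
        for query, ref in ((a, b), (b, a)):
            if query not in best or score > best[query][1]:
                best[query] = (ref, score)
    return best


def classify(
    best: Dict[str, Tuple[str, float]],
    specimen_of: Dict[str, str],
    species_of: Dict[str, str],
) -> Dict[str, str]:
    """Label each best match as 'correct', 'cross-match' or 'unreferenced'."""
    specimens_per_species: Dict[str, Set[str]] = {}
    for spectrum, species in species_of.items():
        specimens_per_species.setdefault(species, set()).add(specimen_of[spectrum])
    labels = {}
    for query, (ref, _score) in best.items():
        if species_of[query] == species_of[ref]:
            labels[query] = "correct"
        elif len(specimens_per_species[species_of[query]]) == 1:
            labels[query] = "unreferenced"  # no other specimen of this species to match
        else:
            labels[query] = "cross-match"
    return labels
```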

Figure 5. Distribution of the maximum log10CCI value collated by spectra in bank-to-bank comparison by category of result, excluding comparisons between technical replicates of the same sample. (A) Correct matches with another specimen of the same species, (B) true cross-matches between a species referenced in the queried database and a different species, (C) species represented by only one specimen in the reference mass spectra database and therefore not included in the queried database because self-match was disabled.

Table 4. Distribution of log10CCI values in true cross-matches collated by species
query species         best-match species   n      median         q25         q75
An. dissidens         An. saeungae         2   -8.029531   -8.566749   -7.492313
An. dravidicus        An. maculatus        4   -9.730305   -9.995887   -9.570995
An. dravidicus        An. minimus          1  -10.019018  -10.019018  -10.019018
An. dravidicus        An. nivipes          1  -10.218182  -10.218182  -10.218182
An. dravidicus        An. splendidus       1   -9.876309   -9.876309   -9.876309
An. karwari           An. dissidens        1  -11.807625  -11.807625  -11.807625
An. karwari           An. jamesii          1   -9.879874   -9.879874   -9.879874
An. kochi             An. pseudowillmori   1  -16.999292  -16.999292  -16.999292
An. pseudowillmori    An. maculatus        2  -11.817782  -11.861658  -11.773905
An. saeungae          An. dissidens        5   -7.054183   -7.469164   -7.024720
An. sawadwongporni    An. maculatus        8  -13.762174  -13.998237  -13.640210
An. sawadwongporni    An. nivipes          2  -14.169721  -14.328012  -14.011430
An. tessellatus s.l.  An. kochi           10  -12.176384  -13.076071  -10.951927
Table 5. List of queries of unreferenced species.
query spectrum    best-match spectrum  query sample   best-match sample  query species                 best-match species    log10CCI
202201060900_1K1  202103050900_2I2     METF2_0031708  METF2_0006000      An. campestris/wejchoochotei  An. dissidens        -9.150524
202201060900_1K2  202103050900_2I3     METF2_0031708  METF2_0006000      An. campestris/wejchoochotei  An. dissidens        -9.547698
202201060900_1K3  202103050900_2I1     METF2_0031708  METF2_0006000      An. campestris/wejchoochotei  An. dissidens       -10.236264
202201060900_1K4  202103050900_2J3     METF2_0031708  METF2_0006000      An. campestris/wejchoochotei  An. dissidens        -9.322730
202201060900_1L1  202103050900_2K1     METF2_0031708  METF2_0006000      An. campestris/wejchoochotei  An. dissidens        -9.099370
202201060900_1L4  202103050900_2I1     METF2_0031708  METF2_0006000      An. campestris/wejchoochotei  An. dissidens        -9.225623
202201060900_2A1  202103050900_2I3     METF2_0031708  METF2_0006000      An. campestris/wejchoochotei  An. dissidens        -9.626956
202201060900_2A2  202103050900_2J4     METF2_0031708  METF2_0006000      An. campestris/wejchoochotei  An. dissidens       -10.053820
202201060900_1L3  202201060900_2B2     METF2_0031708  METF2_0031712      An. campestris/wejchoochotei  An. saeungae        -10.225270
202103110900_2F3  202103290900_1H2     METF2_0018441  METF2_0009082      An. interruptus               An. minimus         -13.402686
202103110900_2F4  202103190900_3B4     METF2_0018441  METF2_0005907      An. interruptus               An. minimus         -13.853556
202103110900_2G1  202103101200_2B3     METF2_0018441  METF2_0013870      An. interruptus               An. minimus         -16.529050
202103110900_2G2  202103080900_2C2     METF2_0018441  METF2_0005843      An. interruptus               An. minimus         -15.613003
202103110900_2G3  202103190900_3C1     METF2_0018441  METF2_0005907      An. interruptus               An. minimus         -13.825802
202103110900_2G4  202103090900_1H2     METF2_0018441  METF2_0008991      An. interruptus               An. minimus         -14.449650
202103110900_2H1  202103290900_1G4     METF2_0018441  METF2_0009082      An. interruptus               An. minimus         -14.295210
202103110900_2H2  202103190900_3C4     METF2_0018441  METF2_0005907      An. interruptus               An. minimus         -13.944487
202103110900_2H3  202103190900_3C1     METF2_0018441  METF2_0005907      An. interruptus               An. minimus         -13.601965
202103110900_2H4  202103190900_3C4     METF2_0018441  METF2_0005907      An. interruptus               An. minimus         -14.568332
Table 6. Distribution of log10CCI values in queries of unreferenced species collated by species.
query species                 best-match species   n      median         q25         q75
An. campestris/wejchoochotei  An. dissidens        8   -9.435214   -9.733672   -9.206849
An. campestris/wejchoochotei  An. saeungae         1  -10.225270  -10.225270  -10.225270
An. interruptus               An. minimus         10  -14.119848  -14.538661  -13.832740
Table S3. Distribution of log10CCI values in concordant matches collated by species
species                specimens  spectra  median log10CCI  IQR log10CCI
An. aconitus s.l.              2       20             -9.3  -9.7 to -8.9
An. annularis s.l.            10      100             -8.8  -9.2 to -8.3
An. baimaii                   12      120             -7.9  -8.9 to -6.6
An. culicifacies s.l.         14      140             -8.0  -8.6 to -7.5
An. dissidens                 24      238             -7.7  -8.7 to -7.0
An. dravidicus                 3       23            -10.3  -10.8 to -10.1
An. jamesii                   15      150             -7.8  -8.4 to -7.2
An. jeyporiensis               6       60             -8.0  -9.5 to -7.7
An. karwari                    2       18            -10.5  -10.8 to -10.2
An. kochi                     24      237             -7.7  -8.5 to -6.6
An. maculatus                 16      160             -8.3  -9.0 to -7.4
An. minimus                   40      400             -7.3  -8.2 to -6.5
An. nivipes                   13      129             -7.1  -8.2 to -6.2
An. peditaeniatus              3       30             -9.6  -10.5 to -9.2
An. philippinensis             4       40             -8.9  -9.5 to -8.3
An. pseudowillmori            10       98             -7.3  -7.9 to -6.6
An. saeungae                   4       35             -7.5  -7.9 to -7.2
An. sawadwongporni             4       40             -9.3  -10.3 to -8.8
An. sinensis                  17      170             -7.1  -7.6 to -6.4
An. splendidus                11      109             -7.5  -8.3 to -7.0
An. tessellatus s.l.           2       10            -11.3  -11.6 to -11.1
An. vagus                     13      130             -8.0  -8.6 to -7.3
An. varuna                     2       20             -9.6  -10.2 to -9.3

Evaluation of performance

Characteristics of the test panel

## [1] "1049 mass spectra of the 105 PCR-identified specimens included in the test panel were queried against the reference database, yielding 2659215 pairwise comparisons"
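Every test spectrum is compared with every reference spectrum, so the number of comparisons is the product of the two spectrum counts:

```latex
1049 \times 2535 = 2\,659\,215
```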

ROC curve analysis

Figure 6. Evaluation of the performance of the reference mass spectra database for Anopheles species identification using the test panel. (A) Sensitivity and specificity at varying identification thresholds considering one spot per specimen, (B) corresponding receiver operating characteristic (ROC) curve, (C) sensitivity and specificity at varying identification thresholds considering four spots per specimen and (D) corresponding ROC curve. The shaded areas in panels A and C show the 95% credible interval around the median estimate of 1000 simulations. The dashed line in panels B and D shows the performance of a random classification.
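Threshold-based metrics of the kind plotted in Figure 6 can be sketched as below. The confusion-matrix definitions are an assumption, because the text does not spell them out:

* A query is accepted when its best-match log10CCI reaches the threshold.
* An accepted query is a true positive when the best-match species is correct.
* A rejected query whose best match is wrong (for example, a species absent from the reference MSL) is a true negative.

Under other definitions, this sketch would not necessarily reproduce Table 2. The names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

Query = Tuple[float, bool]  # (best-match log10CCI, best-match species is correct)


@dataclass
class Performance:
    threshold: float
    sensitivity: float
    specificity: float
    ppv: float
    accuracy: float


def _ratio(num: int, den: int) -> float:
    return num / den if den else math.nan


def performance(queries: Sequence[Query], threshold: float) -> Performance:
    """Confusion-matrix metrics when an identification requires log10CCI >= threshold."""
    tp = fp = tn = fn = 0
    for score, correct in queries:
        accepted = score >= threshold  # -inf is never accepted
        if accepted and correct:
            tp += 1
        elif accepted:
            fp += 1
        elif correct:
            fn += 1
        else:
            tn += 1
    return Performance(
        threshold,
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        accuracy=_ratio(tp + tn, len(queries)),
    )


def roc_points(queries: Sequence[Query], thresholds: Iterable[float]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) pairs tracing the ROC curve."""
    points = []
    for t in thresholds:
        p = performance(queries, t)
        points.append((1 - p.specificity, p.sensitivity))
    return points
```

Sweeping the threshold from -14 to -10, as in Table 2, moves the operating point from high sensitivity towards high specificity.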

Table 2. Performance of Anopheles species identification with MALDI-TOF MS using the reference MSL.
threshold  spots  sensitivity          specificity          PPV                  accuracy
-14        1      0.96 (0.92 to 0.99)  0.39 (0.32 to 0.46)  0.94 (0.92 to 0.96)  0.90 (0.87 to 0.94)
-14        2      1.00 (0.98 to 1.00)  0.27 (0.22 to 0.31)  0.94 (0.91 to 0.96)  0.93 (0.90 to 0.96)
-14        4      1.00 (1.00 to 1.00)  0.19 (0.16 to 0.23)  0.93 (0.91 to 0.95)  0.93 (0.91 to 0.95)
-14        9      1.00 (1.00 to 1.00)  0.14 (0.14 to 0.16)  0.92 (0.92 to 0.93)  0.92 (0.92 to 0.93)
-13        1      0.91 (0.86 to 0.95)  0.65 (0.58 to 0.71)  0.95 (0.93 to 0.97)  0.87 (0.83 to 0.90)
-13        2      0.96 (0.93 to 0.99)  0.51 (0.46 to 0.58)  0.95 (0.92 to 0.97)  0.91 (0.88 to 0.94)
-13        4      0.99 (0.97 to 1.00)  0.40 (0.36 to 0.45)  0.93 (0.91 to 0.96)  0.92 (0.90 to 0.95)
-13        9      1.00 (0.99 to 1.00)  0.30 (0.30 to 0.32)  0.92 (0.92 to 0.93)  0.92 (0.91 to 0.93)
-12        1      0.82 (0.77 to 0.87)  0.85 (0.80 to 0.90)  0.97 (0.94 to 0.99)  0.81 (0.75 to 0.86)
-12        2      0.89 (0.86 to 0.92)  0.78 (0.73 to 0.83)  0.96 (0.94 to 0.98)  0.86 (0.83 to 0.89)
-12        4      0.92 (0.89 to 0.94)  0.71 (0.68 to 0.76)  0.95 (0.94 to 0.97)  0.88 (0.85 to 0.90)
-12        9      0.95 (0.93 to 0.95)  0.62 (0.61 to 0.65)  0.94 (0.94 to 0.95)  0.90 (0.88 to 0.90)
-11        1      0.69 (0.63 to 0.75)  0.96 (0.93 to 0.99)  1.00 (0.97 to 1.00)  0.70 (0.63 to 0.74)
-11        2      0.79 (0.75 to 0.82)  0.93 (0.90 to 0.96)  0.99 (0.98 to 1.00)  0.79 (0.74 to 0.82)
-11        4      0.84 (0.81 to 0.87)  0.90 (0.88 to 0.92)  0.99 (0.98 to 1.00)  0.83 (0.80 to 0.87)
-11        9      0.89 (0.87 to 0.89)  0.86 (0.86 to 0.88)  0.98 (0.98 to 0.99)  0.88 (0.86 to 0.89)
-10        1      0.48 (0.42 to 0.53)  1.00 (1.00 to 1.00)  1.00 (1.00 to 1.00)  0.49 (0.43 to 0.54)
-10        2      0.59 (0.54 to 0.64)  1.00 (1.00 to 1.00)  1.00 (1.00 to 1.00)  0.60 (0.55 to 0.65)
-10        4      0.69 (0.65 to 0.72)  1.00 (1.00 to 1.00)  1.00 (1.00 to 1.00)  0.70 (0.66 to 0.72)
-10        9      0.75 (0.73 to 0.75)  1.00 (1.00 to 1.00)  1.00 (1.00 to 1.00)  0.75 (0.73 to 0.75)
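The effect of the number of spots per specimen, together with intervals from repeated simulations, could be approximated with the resampling sketch below. Three details are assumptions rather than facts taken from the text:

* The aggregation rule. In each simulation, k spots are drawn at random for every specimen, and the spot with the highest best-match log10CCI determines the specimen-level call.
* The interval, taken as the 2.5th and 97.5th percentiles of the simulated values.
* The data layout and all names, which are hypothetical.

The summary also presumes that every simulation contains both correct and incorrect calls, so that no NaN enters the percentiles.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Spot = Tuple[float, bool]  # (best-match log10CCI, best-match species is correct)
Summary = Tuple[float, float, float]


def specimen_call(spots: Sequence[Spot], k: int, rng: random.Random) -> Spot:
    """Draw k spots of one specimen and keep the highest-scoring one (assumed rule)."""
    drawn = rng.sample(list(spots), min(k, len(spots)))
    return max(drawn, key=lambda spot: spot[0])


def sensitivity_specificity(calls: Sequence[Spot], threshold: float) -> Tuple[float, float]:
    tp = sum(1 for s, ok in calls if ok and s >= threshold)
    fn = sum(1 for s, ok in calls if ok and s < threshold)
    tn = sum(1 for s, ok in calls if not ok and s < threshold)
    fp = sum(1 for s, ok in calls if not ok and s >= threshold)
    sens = tp / (tp + fn) if tp + fn else math.nan
    spec = tn / (tn + fp) if tn + fp else math.nan
    return sens, spec


def summarize(draws: List[float]) -> Summary:
    """Median and 2.5th / 97.5th percentiles of the simulated values."""
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.median(draws), cuts[0], cuts[-1]


def simulate(
    specimens: Dict[str, Sequence[Spot]], k: int, threshold: float,
    n_sim: int = 1000, seed: int = 1,
) -> Tuple[Summary, Summary]:
    """Resample k spots per specimen n_sim times; summarize sensitivity and specificity."""
    rng = random.Random(seed)
    sens_draws: List[float] = []
    spec_draws: List[float] = []
    for _ in range(n_sim):
        calls = [specimen_call(spots, k, rng) for spots in specimens.values()]
        sens, spec = sensitivity_specificity(calls, threshold)
        sens_draws.append(sens)
        spec_draws.append(spec)
    return summarize(sens_draws), summarize(spec_draws)
```

Under this rule, adding spots can only raise each specimen's best score. That raises sensitivity and lowers specificity at a fixed threshold, which is consistent with the direction of the trends in Table 2.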